智能论文笔记

BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus

Josh Meyer , David Ifeoluwa Adelani , Edresson Casanova , Alp Öktem , Daniel Whitenack Julian Weber , Salomon Kabongo , Elizabeth Salesky , Iroro Orife , Colin Leong , Perez Ogayo

分类：自然语言处理

2022-07-07

Bibletts是一种在撒哈拉以南非洲使用的十种语言的大型，高质量的开放语音数据集。该语料库包含每语言最多86个小时的对齐，工作室质量的48kHz单扬声器唱片，从而能够开发高质量的文本到语音模型。代表的十种语言是：Akuapem Twi，Asante Twi，Chichewa，Ewe，Hausa，Kikuyu，Lingala，Luganda，Luganda，Luo和Yoruba。该语料库是由Biblica的Open.Bible Project制作和发行的圣经录音的衍生作品。我们已经对齐，清洁和过滤了原始录音，并还对每种语言的对齐子进行了手工检查。我们为具有Coqui TTS的文本到语音模型提供了结果。数据是根据商业友好的CC-SA许可发布的。

translated by 谷歌翻译

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Edresson Casanova , Julian Weber , Christopher Shulby , Arnaldo Candido Junior , Eren Gölge , Moacir Antonelli Ponti

分类：自然语言处理

2021-12-04

YOUTTS为零拍摄多扬声器TTS的任务带来了多语言方法的力量。我们的方法在VITS模型上构建，并为零拍摄的多扬声器和多语言训练增加了几种新颖的修改。我们实现了最先进的（SOTA）导致零拍摄的多扬声器TTS以及与VCTK数据集上的零拍语音转换中的SOTA相当的结果。此外，我们的方法可以实现具有单扬声器数据集的目标语言的有希望的结果，以低资源语言为零拍摄多扬声器TTS和零拍语音转换系统的开放可能性。最后，可以微调言论不到1分钟的言论，并实现最先进的语音相似性和合理的质量。这对于允许具有非常不同的语音或从训练期间的记录特征的讲话来合成非常重要。

translated by 谷歌翻译

Procedural Generalization by Planning with Self-Supervised World Models

Ankesh Anand , Jacob Walker , Yazhe Li , Eszter Vértes , Julian Schrittwieser , Sherjil Ozair , Théophane Weber , Jessica B. Hamrick

分类：机器学习 | 人工智能

2021-11-02

基于模型的强化学习的关键承诺之一是使用世界内部模型拓展到新颖的环境和任务中的预测。然而，模型的代理商的泛化能力尚不清楚，因为现有的工作在基准测试概括时专注于无模型剂。在这里，我们明确测量模型的代理的泛化能力与其无模型对应物相比。我们专注于Muzero（Schrittwieser等，2020），强大的基于模型的代理商的分析，并评估其在过程和任务泛化方面的性能。我们确定了一个程序概括规划，自我监督代表学习和程序数据分集的三个因素 - 并表明通过组合这些技术，我们实现了普通的最先进的概括性和数据效率（Cobbe等人。，2019）。但是，我们发现这些因素并不总是为Meta-World中的任务泛化基准提供相同的益处（Yu等人，2019），表明转移仍然是一个挑战，可能需要不同的方法而不是程序泛化。总的来说，我们建议建立一个推广的代理需要超越单任务，无模型范例，并朝着在丰富，程序，多任务环境中培训的基于自我监督的模型的代理。

translated by 谷歌翻译

ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports

Katharina Jeblick , Balthasar Schachtner , Jakob Dexl , Andreas Mittermeier , Anna Theresa Stüber , Johanna Topalis , Tobias Weber , Philipp Wesp , Bastian Sabel , Jens Ricke

分类：自然语言处理 | 机器学习

2022-12-30

The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.

translated by 谷歌翻译

Investigating Sindy As a Tool For Causal Discovery In Time Series Signals

Andrew O'Brien , Rosina Weber , Edward Kim

分类：机器学习

2022-12-29

The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. In this paper, we argue that this makes SINDy a potentially useful tool for causal discovery and that existing tools for causal discovery can be used to dramatically improve the performance of SINDy as tool for robust sparse modeling and system identification. We then demonstrate empirically that augmenting the SINDy algorithm with tools from causal discovery can provides engineers with a tool for learning causally robust governing equations.

translated by 谷歌翻译

Periocular Biometrics: A Modality for Unconstrained Scenarios

Fernando Alonso-Fernandez , Josef Bigun , Julian Fierrez , Naser Damer , Hugo Proença , Arun Ross

分类：计算机视觉

2022-12-28

Periocular refers to the region of the face that surrounds the eye socket. This is a feature-rich area that can be used by itself to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired. This can be the case of unconstrained or uncooperative scenarios, where the face may appear partially occluded, or the subject-to-camera distance may be high. However, it has received revived attention during the pandemic due to masked faces, leaving the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state-of-the-art of periocular biometrics, giving an overall framework of its most significant research aspects.

translated by 谷歌翻译

Push-the-Boundary: Boundary-aware Feature Propagation for Semantic Segmentation of 3D Point Clouds

Shenglan Du , Nail Ibrahimli , Jantien Stoter , Julian Kooij , Liangliang Nan

分类：计算机视觉

2022-12-23

Feedforward fully convolutional neural networks currently dominate in semantic segmentation of 3D point clouds. Despite their great success, they suffer from the loss of local information at low-level layers, posing significant challenges to accurate scene segmentation and precise object boundary delineation. Prior works either address this issue by post-processing or jointly learn object boundaries to implicitly improve feature encoding of the networks. These approaches often require additional modules which are difficult to integrate into the original architecture. To improve the segmentation near object boundaries, we propose a boundary-aware feature propagation mechanism. This mechanism is achieved by exploiting a multi-task learning framework that aims to explicitly guide the boundaries to their original locations. With one shared encoder, our network outputs (i) boundary localization, (ii) prediction of directions pointing to the object's interior, and (iii) semantic segmentation, in three parallel streams. The predicted boundaries and directions are fused to propagate the learned features to refine the segmentation. We conduct extensive experiments on the S3DIS and SensatUrban datasets against various baseline methods, demonstrating that our proposed approach yields consistent improvements by reducing boundary errors. Our code is available at https://github.com/shenglandu/PushBoundary.

translated by 谷歌翻译

A C++ Implementation of a Cartesian Impedance Controller for Robotic Manipulators

Matthias Mayr , Julian M. Salt-Ducaju

分类：机器人

2022-12-21

Cartesian impedance control is a type of motion control strategy for robots that improves safety in partially unknown environments by achieving a compliant behavior of the robot with respect to its external forces. This compliant robot behavior has the added benefit of allowing physical human guidance of the robot. In this paper, we propose a C++ implementation of compliance control valid for any torque-commanded robotic manipulator. The proposed controller implements Cartesian impedance control to track a desired end-effector pose. Additionally, joint impedance is projected in the nullspace of the Cartesian robot motion to track a desired robot joint configuration without perturbing the Cartesian motion of the robot. The proposed implementation also allows the robot to apply desired forces and torques to its environment. Several safety features such as filtering, rate limiting, and saturation are included in the proposed implementation. The core functionalities are in a re-usable base library and a Robot Operating System (ROS) ros_control integration is provided on top of that. The implementation was tested with the KUKA LBR iiwa robot and the Franka Emika Robot (Panda) both in simulation and with the physical robots.

translated by 谷歌翻译

Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

Christoph Bergmeir , Frits de Nijs , Abishek Sriramulu , Mahdi Abolghasemi , Richard Bean , John Betts , Quang Bui , Nam Trong Dinh , Nils Einecke , Rasul Esmaeilbeigi

分类：人工智能

2022-12-21

Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively few research has been done in this area. This paper presents the findings of the ``IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim to foster and facilitate research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that lead to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.

translated by 谷歌翻译

DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu , Julian Martin Eisenschlos , Francesco Piccinno , Syrine Krichene , Chenxi Pang , Kenton Lee , Mandar Joshi , Wenhu Chen , Nigel Collier , Yasemin Altun

分类：自然语言处理 | 人工智能 | 计算机视觉

2022-12-20

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.

translated by 谷歌翻译